SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines

Authors

  • Semih Yumusak
  • Erdogan Dogdu
  • Halife Kodaz
  • Andreas Kamilaris
  • Pierre-Yves Vandenbussche

Abstract

Linked data endpoints are online query gateways to semantically annotated linked data sources. These data sources are queried with SPARQL, the standard query language for linked data. Although a linked data endpoint (i.e. a SPARQL endpoint) is a basic Web service, it provides a platform for federated online querying and data linking methods. For linked data consumers, SPARQL endpoint availability and discovery are crucial for live querying and semantic information retrieval. Current studies show that the availability of linked datasets is very low, while the locations of linked data endpoints change frequently. Around half of the endpoints listed in existing repositories are observed to be inaccessible (i.e. offline or dead). Endpoint URLs are shared through repository websites such as Datahub.io; however, these listings are poorly maintained and revised by their publishers. In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process that finds keywords relevant to the linked data domain, and specifically to SPARQL endpoints. These search keywords are then used to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). With this method, most of the SPARQL endpoints currently listed in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. Finally, we have developed a new SPARQL endpoint crawler (SpEC) for crawling and link analysis.

Keywords: Linked Data, Semantic Web, SPARQL Endpoint, Knowledge Graph.
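
The abstract notes that many listed endpoints are offline, so any discovery pipeline needs a way to tell whether a candidate URL actually answers SPARQL queries. The following is a minimal sketch of such a check, not the SpEnD implementation: it sends a trivial `ASK` query over HTTP GET and accepts the URL only if a SPARQL results document with a boolean answer comes back. The function names, the probe query, the timeout, and the acceptance heuristic are all assumptions made for illustration.

```python
import http.client
import json
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical probe query; any cheap, valid SPARQL query would do.
ASK_QUERY = "ASK { ?s ?p ?o }"


def build_probe_url(candidate_url: str, query: str = ASK_QUERY) -> str:
    """Attach the query as a `query` parameter (SPARQL Protocol, query via GET)."""
    separator = "&" if urllib.parse.urlparse(candidate_url).query else "?"
    return candidate_url + separator + urllib.parse.urlencode({"query": query})


def looks_like_sparql_answer(status: int, content_type: str, body: str) -> bool:
    """Heuristic (an assumption): HTTP 200 plus a SPARQL boolean result document."""
    if status != 200:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/sparql-results+json":
        try:
            return isinstance(json.loads(body).get("boolean"), bool)
        except (ValueError, AttributeError):
            return False
    if media_type == "application/sparql-results+xml":
        return "<boolean>" in body
    return False


def probe(candidate_url: str, timeout: float = 10.0) -> bool:
    """Return True if the candidate URL responds like a live SPARQL endpoint."""
    try:
        request = urllib.request.Request(
            build_probe_url(candidate_url),
            headers={
                "Accept": "application/sparql-results+json, "
                          "application/sparql-results+xml"
            },
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            content_type = response.headers.get("Content-Type", "")
            return looks_like_sparql_answer(response.status, content_type, body)
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False
```

Calling `probe` on each URL harvested from search results would separate live endpoints from dead links; repeating the call on a schedule gives a simple form of monitoring. Endpoints that require POST, authentication, or a different result format would be missed by this heuristic.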

Similar resources

Federated SPARQL Queries Processing with Replicated Fragments

Federated query engines make it possible to consume linked data from SPARQL endpoints. Replicating data fragments from different sources makes it possible to reorganize data to better fit the federated query processing of data consumers. However, existing federated query engines poorly support replication. In this paper, we propose a replication-aware federated query engine that extends state-of-art federated query eng...

A Heuristic-Based Approach for Planning Federated SPARQL Queries

A large number of SPARQL endpoints are available to access the Linked Open Data cloud, but query capabilities still remain very limited. Thus, to support efficient semantic data management of federations of endpoints, existing SPARQL query engines need to be equipped with new functionalities. First, queries need to be decomposed into sub-queries not only answered by the available endpoints, ...

Linked Data Query Wizard: A Novel Interface for Accessing SPARQL Endpoints

In an interconnected world, Linked Data is more important than ever before. However, it is still quite difficult to access this new wealth of semantic data directly without having in-depth knowledge about SPARQL and related semantic technologies. Also, most people are currently used to consuming data as 2-dimensional tables. Linked Data is by definition always a graph, and not that many people ...

Tracking Federated Queries in the Linked Data

Federated query engines allow data consumers to execute queries over the federation of Linked Data (LD). However, as federated queries are decomposed into potentially thousands of subqueries distributed among SPARQL endpoints, data providers do not know the federated queries; they only know the subqueries they process. Consequently, unlike warehousing approaches, LD data providers have no access to sec...

ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints

Following the design rules of Linked Data, the number of available SPARQL endpoints that support remote query processing is quickly growing; however, because of the lack of adaptivity, query executions may frequently be unsuccessful. First, fixed plans identified following the traditional optimize-then-execute paradigm may time out as a consequence of endpoint availability. Second, because block...

Journal title:
  • IEICE Transactions

Volume 100-D  Issue 

Pages  -

Publication date 2017